Author Clustering using Hierarchical Clustering Analysis
Abstract
This paper presents our approach to the Author Clustering task at PAN 2017. We performed a hierarchical clustering analysis of three kinds of document features: typed character n-grams, untyped character n-grams, and word n-grams. We experimented with two feature representation methods, the log-entropy model and tf-idf, while tuning minimum frequency threshold values to reduce dimensionality. Our system ranked first in both subtasks, author clustering and authorship-link ranking.
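The pipeline described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' system: the function names (`char_ngrams`, `log_entropy_vectors`, `agglomerative`), the use of untyped character trigrams, cosine similarity, average linkage, and the stopping threshold of 0.5 are all assumptions made for illustration. The weighting follows the usual log-entropy definition, with local weight log(1 + tf) and global weight 1 + Σ p log p / log N, where p is a term's share of its corpus frequency in one document and N is the number of documents.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the untyped character n-grams of a string, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_entropy_vectors(docs, n=3, min_freq=2):
    """Build sparse log-entropy vectors (dicts) over character n-grams.

    N-grams whose total corpus frequency is below min_freq are dropped,
    which is the dimensionality-reduction step. Requires len(docs) > 1.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = [g for g, f in total.items() if f >= min_freq]
    log_n = math.log(len(docs))
    global_w = {}
    for g in vocab:
        # Sum of p * log(p) over the documents that contain the n-gram.
        ent = sum((c[g] / total[g]) * math.log(c[g] / total[g])
                  for c in counts if c[g])
        global_w[g] = 1.0 + ent / log_n
    return [{g: math.log(1 + c[g]) * global_w[g] for g in vocab if c[g]}
            for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def agglomerative(vectors, threshold=0.5):
    """Average-linkage agglomerative clustering on cosine similarity.

    Merging stops once the best average similarity falls below threshold
    (an assumed stopping rule, not taken from the paper).
    """
    clusters = [[i] for i in range(len(vectors))]

    def link(a, b):
        sims = [cosine(vectors[i], vectors[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, (x, y) = max(
            (link(clusters[x], clusters[y]), (x, y))
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters)))
        if best < threshold:
            break
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
    return clusters


# Toy documents standing in for texts by two different authors.
toy_docs = [
    "the cat sat on the mat",
    "the cat sat on the hat",
    "zzz qqq xxx zzz qqq",
    "zzz qqq xxx zzz www",
]
toy_clusters = agglomerative(log_entropy_vectors(toy_docs), threshold=0.5)
print(toy_clusters)
```

Raising `min_freq` shrinks the vocabulary, and raising `threshold` yields more, smaller clusters. In practice both would be tuned on training data rather than fixed as they are here.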
Similar resources
Determination of the Best Hierarchical Clustering Method for Regional Analysis of Base Flow Index in Kerman Province Catchments
The lack of complete coverage of hydrological data forces hydrologists to use homogenization methods in regional analysis. In this research, in order to choose the best hierarchical clustering method for regional analysis, base flow and the related index were extracted from daily stream flow data using two-parameter recursive digital filters at 43 hydrometric stations in Kerman province. Ph...
Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors
Homogeneity of groups is important in studies that use cross-section and multi-level data. Most studies in economics, especially panel data analyses, need some kind of homogeneity to ensure the validity of their results. This paper presents methods for clustering and homogenization of groups in cross-section studies based on enviro-economic components. For this, a sample of 92 countries which...
Application of Clustering Methods in DNA Microarray
Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering, and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members
Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing it is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for graph clustering, but it gives us no way to select the number of clusters and the number of members ...